[Bugfix][plugin] fla crash on plugin #27322

Merged
mgoin merged 1 commit into vllm-project:main from ILikeIneine:fix-fla-crash-on-plugin
Nov 3, 2025
Conversation

@ILikeIneine
Contributor

@ILikeIneine ILikeIneine commented Oct 22, 2025

Purpose

There is a problem when supporting fla on a plugin platform: importing fla/ops/utils crashes here.

On a plugin platform, the device may have its own name (in vllm-metax it is maca), while device_torch_lib still needs to resolve to the matching torch library (in vllm-metax it is torch.cuda).

So I use is_cuda_alike and set the getattr default to None to handle these corner cases. The semantics stay consistent with the original code.

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@ILikeIneine changed the title from "fix fla crash on plugin" to "[Bugfix][plugin] fla crash on plugin" on Oct 22, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request addresses a crash in Flash-Linear-Attention (FLA) operations when used with plugins. The changes in vllm/model_executor/layers/fla/ops/utils.py are well-reasoned and effective. By leveraging current_platform.is_cuda_alike(), the code now correctly identifies CUDA-compatible platforms (including plugins) and sets the device library appropriately. Adding a None default to getattr is a good defensive measure that prevents crashes on other platforms like CPU, making the utility more robust. The fix is correct and improves the overall stability of FLA operations in diverse environments.
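The defensive `getattr` mentioned here can be illustrated in isolation (a minimal example; `maca` is simply a plugin device name that torch itself does not expose as an attribute):

```python
import torch

# torch has no "maca" attribute, so a bare getattr(torch, "maca") would
# raise AttributeError. Supplying None as the default lets the lookup
# degrade gracefully instead of crashing at import time.
device_torch_lib = getattr(torch, "maca", None)
assert device_torch_lib is None
```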

Collaborator

@NickLucche NickLucche left a comment


I think this looks fine but I don't have the context on fla.
Perhaps @youkaichao can take a quick look at it.

@ILikeIneine force-pushed the fix-fla-crash-on-plugin branch from 873d0a9 to b483cc9 on October 24, 2025 02:23
@mgoin mgoin added the ready ONLY add when PR is ready to merge/full CI is needed label Nov 1, 2025
Member

@mgoin mgoin left a comment


Looks simple enough to me. I believe the logic is kept the same for Nvidia and AMD, so nothing else changes for Intel or CPU.

Signed-off-by: Hank <hcc.mayday@gmail.com>
@ILikeIneine force-pushed the fix-fla-crash-on-plugin branch from 59e4598 to e379c89 on November 3, 2025 02:08
@mgoin mgoin merged commit ccd3e55 into vllm-project:main Nov 3, 2025
47 checks passed
ZhengHongming888 pushed a commit to ZhengHongming888/vllm that referenced this pull request Nov 8, 2025
ILikeIneine added a commit to MetaX-MACA/vLLM-metax that referenced this pull request Nov 11, 2025
related: vllm-project/vllm/pull/27322

Signed-off-by: Hank <hcc.mayday@gmail.com>
ILikeIneine added a commit to MetaX-MACA/vLLM-metax that referenced this pull request Nov 14, 2025
* support platform and remove kernel copy

Signed-off-by: Hank <hcc.mayday@gmail.com>

* update pre-commit

Signed-off-by: Hank <hcc.mayday@gmail.com>

* update version and requirements

Signed-off-by: Hank <hcc.mayday@gmail.com>

* update flashinfer

Signed-off-by: Hank <hcc.mayday@gmail.com>

* update build requirements

Signed-off-by: Hank <hcc.mayday@gmail.com>

* update attention backends

Signed-off-by: Hank <hcc.mayday@gmail.com>

* update patch

Signed-off-by: Hank <hcc.mayday@gmail.com>

* update quant_method

Signed-off-by: Hank <hcc.mayday@gmail.com>

* update fuse_moe (todo: fix mypy)

Signed-off-by: Hank <hcc.mayday@gmail.com>

* update `deepseek_v2.py`(todo: fix indexer kernel)

Signed-off-by: Hank <hcc.mayday@gmail.com>

* [feat] support bf16 cp_gather_indexer_k_cache kernel

Signed-off-by: Xin Li <lixin1620@gmail.com>

* [fix] fix type error in bf16_paged_mqa_logits

Signed-off-by: leex404 <lixin1620@gmail.com>

* [feat] add topk logits ops

Signed-off-by: leex404 <lixin1620@gmail.com>

* [fix] private memory size too large in `sample_recovered_tokens_kernel` (#115)

* [fix] fix sample_recovered_tokens_kernel use too much private memory

Signed-off-by: Xin Li <xin.li@metax-tech.com>

* [fix] fix type error in bf16_paged_mqa_logits

Signed-off-by: Xin Li <xin.li@metax-tech.com>

* [chore] change file directory

Signed-off-by: Xin Li <xin.li@metax-tech.com>

---------

Signed-off-by: Xin Li <xin.li@metax-tech.com>
Co-authored-by: Xin Li <xin.li@metax-tech.com>

Signed-off-by: leex404 <lixin1620@gmail.com>

* [fix] fix missing topk logits custom ops definition

Signed-off-by: leex404 <lixin1620@gmail.com>

* [fix] add custom gptq_shuffle ops

Signed-off-by: leex404 <lixin1620@gmail.com>

* [fix] fix compile error

Signed-off-by: leex404 <lixin1620@gmail.com>

* platform config update

Signed-off-by: Hank <hcc.mayday@gmail.com>

* update qwen2.5_vl model

Signed-off-by: Hank <hcc.mayday@gmail.com>

* [fix] fix torch not found maca device

Signed-off-by: leex404 <lixin1620@gmail.com>

* remove hotfixes patch for torch2.8

Signed-off-by: Hank <hcc.mayday@gmail.com>

* remove needless patch

related: vllm-project/vllm/pull/27322

Signed-off-by: Hank <hcc.mayday@gmail.com>

* [feat] topk_softmax support renormalize and bf16

Signed-off-by: leex404 <lixin1620@gmail.com>

* [fix] update fused_moe to fit v0.11.1

Signed-off-by: leex404 <lixin1620@gmail.com>

* [fix] fix fused moe config log missing

Signed-off-by: leex404 <lixin1620@gmail.com>

* use flash_attn as vit attn backend on qwen_vl

Signed-off-by: Hank <hcc.mayday@gmail.com>

* update quant_conf registry

Signed-off-by: Hank <hcc.mayday@gmail.com>

* fix and apply latest pre-commit of v0.11.1

Signed-off-by: Hank <hcc.mayday@gmail.com>

* [feat] Keep all AITER kernels in _aiter_ops

Signed-off-by: leex404 <lixin1620@gmail.com>

* fix pre-commit on type casting

Signed-off-by: Hank <hcc.mayday@gmail.com>

* [fix] fix DeepSeek import error

Signed-off-by: leex404 <lixin1620@gmail.com>

* [feat] update deepseek_v2 to fit v0.11.1

Signed-off-by: leex404 <lixin1620@gmail.com>

---------

Signed-off-by: Hank <hcc.mayday@gmail.com>
Signed-off-by: Xin Li <lixin1620@gmail.com>
Signed-off-by: leex404 <lixin1620@gmail.com>
Co-authored-by: Xin Li <xin.li@metax-tech.com>
Co-authored-by: leex404 <lixin1620@gmail.com>
Co-authored-by: leex404 <42941760+leex404@users.noreply.github.com>
devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request Nov 29, 2025

Labels

ready ONLY add when PR is ready to merge/full CI is needed

3 participants